
fix(mcp): use tiktoken for response-size-guard token estimation#39912

Merged
aminghadersohi merged 2 commits into apache:master from
aminghadersohi:aminghadersohi/fix-mcp-token-counting
May 7, 2026

Conversation

@aminghadersohi
Contributor

SUMMARY

The MCP response-size-guard middleware (ResponseSizeGuardMiddleware) estimates token counts to decide when to truncate or block oversized tool responses. The existing estimator at superset/mcp_service/utils/token_utils.py used a simple char-to-token heuristic (CHARS_PER_TOKEN = 3.5) that miscounts JSON-heavy MCP responses relative to Claude's actual tokenizer. Specific responses could slip past the configured token limit while still being truncated by the Claude Agent SDK's own threshold — the SDK then saved them into a file the model could not read back, causing 120s timeouts in tool calls like get_dataset_info for wide datasets.

This PR switches the estimator to tiktoken's cl100k_base encoding — a real BPE tokenizer with a vocabulary similar to Claude's. For English and JSON-heavy content it tracks Claude's counts within roughly ±10%, which is far closer than any character-ratio heuristic.

The previous heuristic stays as a graceful fallback for environments where tiktoken is not installed; its ratio drops from 3.5 → 3.0 chars/token to be more conservative for JSON content (which under-counted before).
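A minimal sketch of the estimator described above, assuming the names from this PR description (`_ENCODING`, `CHARS_PER_TOKEN`, `estimate_token_count`); the actual code in `token_utils.py` may differ in detail:

```python
from __future__ import annotations

# Fallback ratio, lowered from 3.5 to 3.0 to be conservative for JSON content.
CHARS_PER_TOKEN = 3.0

try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:  # tiktoken not installed
    _ENCODING = None


def estimate_token_count(text: str | bytes) -> int:
    """Estimate a Claude-aligned token count; must never raise."""
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    if _ENCODING is not None:
        try:
            return len(_ENCODING.encode(text))
        except Exception:
            # Fall through to the heuristic rather than letting the
            # size guard fail open on a tokenizer error.
            pass
    return max(1, int(len(text) / CHARS_PER_TOKEN))
```

With tiktoken installed the count comes from the cl100k_base encoding; without it, 300 characters estimate to 100 tokens under the 3.0 ratio.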

BEFORE/AFTER

estimate_token_count("an 80KB JSON dataset info response")
  before (3.5 chars/token):       ~22,800 tokens (slipped past 25k cap)
  after (tiktoken cl100k_base):   accurate Claude-aligned count

TESTING INSTRUCTIONS

pytest tests/unit_tests/mcp_service/utils/test_token_utils.py -v
pytest tests/unit_tests/mcp_service/test_middleware.py -v

New unit tests cover:

  • tiktoken-loaded path produces non-zero counts
  • bytes input matches string input
  • Length monotonicity (doubling input ≈ doubles count, ±10%)
  • Fallback path when _ENCODING is None (tiktoken not installed) uses len/CHARS_PER_TOKEN
  • Defensive fallback when tiktoken's encode raises — the size guard must never fail-open
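The fallback-path test can be sketched in isolation. This toy module mirrors the layout the PR describes (a module-level `_ENCODING` that the test patches to `None`); every name here is assumed from the description above, not taken from the repo:

```python
import sys
import types
from unittest import mock

# Toy stand-in for superset/mcp_service/utils/token_utils.py (illustrative).
token_utils = types.ModuleType("toy_token_utils")
exec(
    """
CHARS_PER_TOKEN = 3.0
try:
    import tiktoken
    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:
    _ENCODING = None

def estimate_token_count(text):
    if _ENCODING is not None:
        try:
            return len(_ENCODING.encode(text))
        except Exception:
            pass
    return max(1, int(len(text) / CHARS_PER_TOKEN))
""",
    token_utils.__dict__,
)
sys.modules["toy_token_utils"] = token_utils

# Patching _ENCODING to None exercises the len/CHARS_PER_TOKEN fallback
# deterministically, whether or not tiktoken is installed.
with mock.patch.object(token_utils, "_ENCODING", None):
    assert token_utils.estimate_token_count("x" * 300) == 100  # 300 / 3.0
```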

ADDITIONAL INFORMATION

  • New dependency: tiktoken>=0.7.0,<1.0 added to the fastmcp extra in pyproject.toml. Anyone installing apache-superset[fastmcp] gets it automatically. requirements/base.txt and requirements/development.txt regenerated via scripts/uv-pip-compile.sh.

  • No network calls: tiktoken tokenizes entirely offline. Anthropic's count_tokens API is more accurate but adds a network roundtrip per tool result, which is too expensive for synchronous middleware.

  • Behavioral change: previously-passing token estimates for the same content will now report different (more accurate) numbers. Sites relying on a specific cap will see different effective behavior — typically slightly more conservative truncation for English-text-heavy responses, slightly less for highly repetitive content (BPE compresses repetition).

  • Has associated issue:

  • Required feature flags:

  • Changes UI

  • Includes DB Migration (follow approval process in SIP-59)

  • Introduces new feature or API

  • Removes existing feature or API

The MCP response-size-guard middleware estimates token counts to
decide when to truncate or block oversized tool responses. The
existing estimator used a simple char-to-token heuristic
(CHARS_PER_TOKEN = 3.5) that miscounts JSON-heavy MCP responses
relative to Claude's actual tokenizer:

  estimate_token_count("an 80KB get_dataset_info response")
  - 3.5 chars/token heuristic: ~22,800 tokens
  - tiktoken cl100k_base:        varies by content; closer to truth

This let payloads slip past the 25k token limit while still being
truncated by Claude Agent SDK's own threshold — the SDK saved them
into a file the model couldn't read back, causing 120s timeouts in
the eval suite.

Switch the estimator to tiktoken's cl100k_base encoding (a real BPE
tokenizer with a vocabulary similar to Claude's; tracks Claude's
counts within roughly ±10% for English and JSON-heavy content). The
char-based heuristic stays as a fallback for environments where
tiktoken is not installed; its ratio drops from 3.5 to 3.0 chars per
token to be more conservative for JSON content.

tiktoken is added as a dependency of the existing ``fastmcp`` extra,
so anyone installing ``apache-superset[fastmcp]`` gets it
automatically. requirements/base.txt and requirements/development.txt
are regenerated via scripts/uv-pip-compile.sh.

Tests:
- New unit tests cover the tiktoken-loaded path, the unavailable-
  fallback path, and a defensive fallback when tiktoken's encode
  raises (the size guard must never fail-open).
- One existing middleware test that depended on the old 3.5
  chars/token heuristic is now decoupled by mocking
  estimate_response_tokens directly.
@bito-code-review
Contributor

bito-code-review Bot commented May 6, 2026

Code Review Agent Run #c6406b

Actionable Suggestions - 0
Review Details
  • Files reviewed - 6 · Commit Range: f29b17a..f29b17a
    • pyproject.toml
    • requirements/base.txt
    • requirements/development.txt
    • superset/mcp_service/utils/token_utils.py
    • tests/unit_tests/mcp_service/test_middleware.py
    • tests/unit_tests/mcp_service/utils/test_token_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type one of the following commands in a pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@dosubot dosubot Bot added the change:backend (Requires changing the backend) and dependencies:python labels May 6, 2026
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.88%. Comparing base (e667ceb) to head (f20e95b).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
superset/mcp_service/utils/token_utils.py 0.00% 21 Missing ⚠️
superset/mcp_service/middleware.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39912      +/-   ##
==========================================
- Coverage   63.88%   63.88%   -0.01%     
==========================================
  Files        2583     2583              
  Lines      136602   136617      +15     
  Branches    31501    31502       +1     
==========================================
  Hits        87274    87274              
- Misses      47812    47827      +15     
  Partials     1516     1516              
Flag Coverage Δ
hive 39.38% <0.00%> (-0.01%) ⬇️
mysql 59.04% <0.00%> (-0.02%) ⬇️
postgres 59.12% <0.00%> (-0.02%) ⬇️
presto 41.07% <0.00%> (-0.01%) ⬇️
python 60.56% <0.00%> (-0.02%) ⬇️
sqlite 58.76% <0.00%> (-0.02%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown.

Address review feedback on apache#39912: the lazy imports of
estimate_response_tokens, format_size_limit_error, INFO_TOOLS, and
truncate_oversized_response inside ResponseSizeGuardMiddleware methods
made the conventional patch path
(superset.mcp_service.middleware.estimate_response_tokens) raise
AttributeError because those names didn't exist on the middleware
module. Moving the imports to module level makes patching at
"where it's used" work as expected, which is the standard mock
convention.

Tests now patch superset.mcp_service.middleware.estimate_response_tokens
directly rather than the upstream definition module.
@aminghadersohi
Contributor Author

Addressed in f20e95b.

Root cause: the middleware was lazy-importing estimate_response_tokens, format_size_limit_error, INFO_TOOLS, and truncate_oversized_response inside the methods, so those names never existed on the superset.mcp_service.middleware module — patching the conventional superset.mcp_service.middleware.estimate_response_tokens path raised AttributeError, which is why I had originally targeted the definition module.

Fix: hoisted all four imports to module level. Now superset.mcp_service.middleware.estimate_response_tokens exists as a normal module attribute and the test patches it where it's used, the way the convention assumes.

Not strictly needed but the right call — keeps tests robust against future churn and gets rid of three pointless local imports.
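The patch-where-used convention can be demonstrated with two toy modules standing in for `token_utils` and `middleware` (module and function names here are illustrative, not the actual Superset modules):

```python
import sys
import types
from unittest import mock

# Toy "definition" module, analogous to token_utils.
utils = types.ModuleType("toy_utils")
utils.estimate_response_tokens = lambda payload: len(payload) // 3
sys.modules["toy_utils"] = utils

# Toy "usage" module, analogous to middleware after the fix: the name is
# imported at module level, so it exists as an attribute of this module.
middleware = types.ModuleType("toy_middleware")
exec(
    "from toy_utils import estimate_response_tokens\n"
    "def guard(payload):\n"
    "    return estimate_response_tokens(payload)\n",
    middleware.__dict__,
)
sys.modules["toy_middleware"] = middleware

# Patching where the name is *used* affects the middleware:
with mock.patch("toy_middleware.estimate_response_tokens", return_value=999):
    assert middleware.guard("abc") == 999

# Patching the definition module does not touch the middleware's binding:
with mock.patch("toy_utils.estimate_response_tokens", return_value=999):
    assert middleware.guard("abc") == 1  # still the original function
```

This is why a lazy import inside the method broke the conventional patch path: the name never existed on the middleware module, so `mock.patch` had no attribute to replace.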

@bito-code-review
Contributor

bito-code-review Bot commented May 7, 2026

Code Review Agent Run #dfd6f2

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: f29b17a..f20e95b
    • superset/mcp_service/middleware.py
    • tests/unit_tests/mcp_service/test_middleware.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


@aminghadersohi aminghadersohi merged commit 9b52031 into apache:master May 7, 2026
65 checks passed

Labels

change:backend (Requires changing the backend) · size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants